Implement partition compaction grouper #6172
base: master
Signed-off-by: Alex Le <[email protected]>
Overall this looks good to me, with just a few comments. For users migrating, would it just work to change the configuration and deploy? I would imagine so, since no block would carry partition data and all of them would be treated as partitionId 0. Correct?
Yes. Switching back and forth between partitioning and non-partitioning should not cause any issues. At most, the block with the largest time range would be recompacted one more time.
How does it work while a deployment is in progress? Some compactors could be creating blocks with partitions while others create blocks without, and they would see different visit markers. Would that cause duplicate compaction during the deployment?
If both are compacting the largest time range blocks, it would create duplicate blocks. Any lower-level blocks would be compacted into higher-level blocks properly after the deployment.
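The migration behaviour discussed above (a block with no partition data is treated as partitionId 0) can be illustrated with a minimal sketch. This is not the code from the PR: the struct names, the JSON field names (`partition_info`, `partitioned_group_id`, `partition_count`, `partition_id`) and the helper `partitionIDOf` are all hypothetical, chosen only to show the fallback.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// partitionInfo is a HYPOTHETICAL shape for the partition data a compactor
// might store in a block's meta.json; the field names are assumptions.
type partitionInfo struct {
	PartitionedGroupID uint32 `json:"partitioned_group_id"`
	PartitionCount     int    `json:"partition_count"`
	PartitionID        int    `json:"partition_id"`
}

// blockExtensions is a HYPOTHETICAL wrapper for the extensions object.
type blockExtensions struct {
	PartitionInfo *partitionInfo `json:"partition_info,omitempty"`
}

// partitionIDOf returns the partition ID recorded in the raw extensions JSON.
// Blocks written without partitioning carry no partition data, so they fall
// back to partition 0.
func partitionIDOf(rawExtensions []byte) (int, error) {
	if len(rawExtensions) == 0 {
		return 0, nil // no extensions at all: legacy block
	}
	var ext blockExtensions
	if err := json.Unmarshal(rawExtensions, &ext); err != nil {
		return 0, fmt.Errorf("decode extensions: %w", err)
	}
	if ext.PartitionInfo == nil {
		return 0, nil // extensions present but no partition data
	}
	return ext.PartitionInfo.PartitionID, nil
}

func main() {
	legacy := []byte(`{}`)
	partitioned := []byte(`{"partition_info":{"partitioned_group_id":12345,"partition_count":4,"partition_id":2}}`)

	for _, raw := range [][]byte{legacy, partitioned} {
		id, err := partitionIDOf(raw)
		if err != nil {
			panic(err)
		}
		fmt.Println("partition ID:", id)
	}
}
```

Under these assumptions, a block produced before partitioning was enabled resolves to partition 0, while a block that records its partition keeps the recorded ID.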
What this PR does:
This PR implements the partition compaction grouper.
Introduced new files for partition compaction:
The `partitionedGroupID` in the file is unique for a particular time range.
Here is the high-level algorithm of the partition compaction grouper:
Introduced `meta_extensions` to save the partition information of the result block in meta.json. This information can be used to better assign a block to the proper partition in the next round of compaction.
Which issue(s) this PR fixes:
NA
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]